1.异方差问题

如果随机误差序列的方差会随着时间的变化而变化,这种情况被称为异方差。

\[Var(\epsilon_t)=h(t)\]

异方差的影响:

异方差虽然不会影响回归系数最小二乘估计的无偏性,但会影响回归系数估计的标准差和置信区间。 忽视异方差的存在会导致残差的方差会被严重低估,使得参数的显著性检验失去意义,最终导致模型的拟合精度受影响。

  • 异方差的直观判断

    • 残差图(方差齐性残差图,递增型残差图)

    • 残差平方图:残差序列的方差实际上是它平方的期望 ,所以考察残差序列是 否满足方差齐性,主要考察平方序列是否平稳。

  • 已知异方差形式:使用方差齐性变换 对于标准差与均值成正比关系的异方差序列,对数变换可以有效地实现方差齐性。 图示出现异方差现象,先尝试log看是否能够解决。

  • 未知异方差形式:建立异方差模型。

2. ARCH模型

2.1 ARCH模型的思想

基本思想是在以前信息集下,在某一时刻一个噪声的发生是服从正态分布。该正态分布的均值为零,方差为一个随着时间变化而改变的量,即为条件异方差,并且这个随时间变化的方差是过去有限项噪声值平方的线性组合。这样就构成了自回归条件异方差模型。

2.2 ARCH模型的结构

\[ ARCH(q)\left\{ \begin{aligned} x & = \beta^T(1,x_{t-1},x_{t-2},...)+\epsilon_t \\ \epsilon_t & = \sqrt{h_t}e_t, e_t \sim N(0,1) \\ h_t & = a_0+a_1\epsilon_{t-1}^2+...+a_q\epsilon_{t-q}^2 \end{aligned} \right. \] 应当满足两个约束条件

\[ \left\{ \begin{aligned} a_0>0,\\ 0 <= a_i <1,i =1,2,...,q \\ \sum_{i=1}^qa_i<1 \end{aligned} \right. \]

  • 条件一:方差必须非负,即\(Var(\epsilon_t|\epsilon_{t-1},\epsilon_{t-2},...)>=0\)\(a_0+a_1\epsilon_{t-1}^2+...+a_q\epsilon_{t-q}^2 >=0\)故每个参数要求非负。
  • 条件二:方差平稳

2.3 ARCH模型的优缺点

  • 优点 突破了传统时间序列模型中同方差的假设;将条件方差表达成过去干扰项的回归函数, 能够反映集聚效应;序列存在ARCH效应的时候,直接使用OLS估计会产生偏差,使用 模型能够一定程度上避免此偏差,提高参数的估计精度。

  • 缺点 在实际应用中为了得到更好的结果,ARCH模型的阶数往往很大,参数过多,估计困 难;条件方法假设为线性函数,在现实中线性情况可能只是特例。

3.GARCH模型

3.1 模型的思想

将条件方差本身的滞后值加入\(h_t\)。即ARCH模型实际上就是\(\epsilon_t^2,\epsilon_{t-1}^2,...,\epsilon_{t-q}^2\)的q阶移动平均的MA(q)模型。而GARCH模型实际上就是\(h_t\) 关于\(h_{t-1},...,h_{t-p}\)的p阶自相关,关于\(\epsilon_t^2,\epsilon_{t-1}^2,...,\epsilon_{t-q}^2\)的q阶移动平均的ARMA(p,q)模型。

3.2 模型结构:

\[ GARCH\left\{ \begin{aligned} x_t & = f(t,x_{t-1},x_{t-2},...)+\epsilon_t \\ \epsilon_t & = \sqrt{h_t}e_t, e_t \sim N(0,1) \\ h_t & = a_0+\sum_{j=1}^p\eta_jh_{t-j}+\sum_{i=1}^q\lambda_i\epsilon_{t-i}^2 \end{aligned} \right. \] 参数满足下面的约束条件: \[ \left\{ \begin{aligned} 0<= \lambda_i<1, i=1,2,...,q\\ 0 <= \eta_j <1,j =1,2,...,q \\ 0 <= \sum_{j=1}^p\eta_j+\sum_{j=1}^q\lambda_j<1 \end{aligned} \right. \] 注: (1)\(e_t\)为原序列提取确定信息和条件异方差信息之后的残差波动,所以\(e_t\)应该是真正的 白噪声序列。\(e_t\)还有一个重要的作用,就是根据\(e_t\)的特征确定序列的分布类型。通常 假定\(e_t\)服从正态分布,如果对\(e_t\)的分布检验显示,残差序列显著拒绝正态分布假定,就需要 根据\(e_t\)的分布特征尝试其他分布。

一个完整的条件异方差模型由:均值模型,条件异方差模型和分布假定三块组成。

4.建模流程

同时关注序列信息和序列波动,首先提取序列的均值信息,然后分析残差序列中蕴含的 波动信息

  • 图示法

观察适合何种模型建模,是否具有异方差性的可能。

  • 线性时序建模(ARIMA, 回归拟合+残差自相关)

构建水平模型\(x_t= f(t,x_{t-1},x_{t-2},...)+\epsilon_t\) ,提取序列均值中蕴涵的相关信息. 常用ARIMA模型。建模前需要先对序列的平稳性进行检验,PP检验适用于异方差场合,故怀疑异方差的场合要重视PP检验。

建立水平模型后,检验模型残差是否通过白噪声检验,若通过,查看模型残差平方是否通过白噪声检验,若不通过,说明有异方差性,考虑继续建立ARCH/GARCH; 若一开始的残差就不通过白噪声检验,改进线性模型。

  • 异方差自相关检验(LM test)
  • 模型定阶
    • ARCH: 看acf,pacf
    • GARCH: 通常尝试(1,1),(1,2),(2,1),(2,2)
  • 参数估计
  • 模型检验
    • 参数显著性检验:GARCH模型拟合完成后,首先对参数是否显著大于零进行检验,参 数显著性检验和ARMA模型相同,构建T分布统计量,在给定显著性水平下,拒绝原假 设表明该参数显著非零。

    • 模型显著性检验:条件异方差模型拟合的效果取决于它是否将残差序列中蕴涵的异方差 信息充分提取出来。利用拟合模型估计出来的异方差 ,对残差序列和残差平方序列分 别进行标准化变换,

    • 分布检验:\(e_t\)序列的正态性检验 在构造GARCH模型的时候,如果不特殊指定,通常默认该序列服从正态分布。故此检 验的内容之一就是检验这个分布假定是否正确。

  • 模型预测

4.案例1973-2009Intel公司股票的对数收益率。

step1.时序图

0均值,无季节,趋势,所以直接拟合残差即可。

library(readr)
library(xts)
d.intel <- read_table2(
  "~/Downloads/ftsdata/m-intcsp7309.txt",
  col_types=cols(
    .default=col_double(),
    date=col_date("%Y%m%d")
  ))
xts.intel <- xts(
  log(1 + d.intel[["intc"]]), d.intel[["date"]]
)
tclass(xts.intel) <- "yearmon"
ts.intel <- ts(c(coredata(xts.intel)), start=c(1973,1), frequency=12)
at <- ts.intel - mean(ts.intel)
ts.plot(at)

acf(at) # 可以认为是白噪声

序列的均值模型是常数均值,减去均值后的残差的平方的ACF:

2.ARCH效应检验

  • Ljung-Box白噪声检验

\(a_t\),\(a_t^2\)做白噪声检验

Box.test(at,lag=12,type='Ljung') #通过白噪声检验,说明线性模型拟合OK
## 
##  Box-Ljung test
## 
## data:  at
## X-squared = 18.676, df = 12, p-value = 0.09665
Box.test(at^2,lag=12,type='Ljung') #不通过白噪声检验
## 
##  Box-Ljung test
## 
## data:  at^2
## X-squared = 92.939, df = 12, p-value = 1.332e-14
  • 统计检验方法

    • Portmantea Q检验

    • Engle拉格朗日乘子法检验(LM检验)

"archTest" <- function(x, m=10){
  # Perform Lagrange Multiplier Test for ARCH effect of a time series
  # x: time series, residual of mean equation
  # m: selected AR order
  y <- (x - mean(x))^2
  T <- length(x)
  atsq <- y[(m+1):T]
  xmat <- matrix(0, T-m, m)
  for (j in 1:m){
    xmat[,j] <- y[(m+1-j):(T-j)]
  }
  lmres <- lm(atsq ~ xmat)
  summary(lmres)
}
archTest(at,m=12)
## 
## Call:
## lm(formula = atsq ~ xmat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.07440 -0.01153 -0.00658  0.00395  0.35255 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.005977   0.002249   2.658 0.008162 ** 
## xmat1        0.093817   0.048147   1.949 0.052013 .  
## xmat2        0.153085   0.048102   3.183 0.001569 ** 
## xmat3        0.146087   0.048614   3.005 0.002815 ** 
## xmat4        0.023539   0.049126   0.479 0.632075    
## xmat5        0.007347   0.049107   0.150 0.881139    
## xmat6        0.010342   0.047027   0.220 0.826050    
## xmat7        0.057183   0.047027   1.216 0.224681    
## xmat8        0.014320   0.047079   0.304 0.761149    
## xmat9        0.007157   0.046968   0.152 0.878965    
## xmat10      -0.019742   0.046566  -0.424 0.671810    
## xmat11      -0.057537   0.046041  -1.250 0.212116    
## xmat12       0.161945   0.045965   3.523 0.000473 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03365 on 419 degrees of freedom
## Multiple R-squared:  0.1248, Adjusted R-squared:  0.0997 
## F-statistic: 4.978 on 12 and 419 DF,  p-value: 9.742e-08

检验的p值为 9.742e-08, 高度显著,说明有ARCH效应。

3.模型建立

ARCH模型可以用acf,pacf初步判断阶数

acf(at^2,lag.max=36,main='')

#在12处较高,另外在之后1,2,3上较突出
#残差平方和的pacf
pacf(at^2,lag.max=36,main="")

残差平方的PACF在滞后12处较高, 另外在滞后1到3较高。 考虑建立ARCH(3)作为波动率方程。

\(r_t\)为收益率,拟建立如下的均值方程和波动率方程: \[r_t=\mu+a_t,a_t=\epsilon_t\sigma_t, \epsilon_t i.i.d \sim N(0,1)\]

\[\sigma_t^2=\alpha_0+\alpha_1a_{t-1}^2+\alpha_2a_{t-2}^2+\alpha_3a_{t-3}^2\] 通过构造残差平方序列的自回归模型来拟合异方差函数\(\sigma_t^2\)

使用fGarch包的garchFit()函数建立ARCH模型。

library(fGarch)
mod1 <- garchFit( ~ 1 + garch(3,0), data=c(ts.intel), trace=FALSE)
#其中1表示均值方程是一个常数,
#输出结果中mu表示均值方程的均值,omega表示alpha0,alpha1为alpha1
summary(mod1)
## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~1 + garch(3, 0), data = c(ts.intel), trace = FALSE) 
## 
## Mean and Variance Equation:
##  data ~ 1 + garch(3, 0)
## <environment: 0x7ff74ae972b0>
##  [data = c(ts.intel)]
## 
## Conditional Distribution:
##  norm 
## 
## Coefficient(s):
##       mu     omega    alpha1    alpha2    alpha3  
## 0.012567  0.010421  0.232889  0.075069  0.051994  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##         Estimate  Std. Error  t value Pr(>|t|)    
## mu      0.012567    0.005515    2.279   0.0227 *  
## omega   0.010421    0.001238    8.418   <2e-16 ***
## alpha1  0.232889    0.111541    2.088   0.0368 *  
## alpha2  0.075069    0.047305    1.587   0.1125    
## alpha3  0.051994    0.045139    1.152   0.2494    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  303.9607    normalized:  0.6845963 
## 
## Description:
##  Mon Mar  8 21:14:33 2021 by user:  
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value     
##  Jarque-Bera Test   R    Chi^2  203.362   0           
##  Shapiro-Wilk Test  R    W      0.9635971 4.898647e-09
##  Ljung-Box Test     R    Q(10)  9.260782  0.5075463   
##  Ljung-Box Test     R    Q(15)  19.36748  0.1975619   
##  Ljung-Box Test     R    Q(20)  20.46983  0.4289059   
##  Ljung-Box Test     R^2  Q(10)  7.322136  0.6947234   
##  Ljung-Box Test     R^2  Q(15)  27.41532  0.02552908  
##  Ljung-Box Test     R^2  Q(20)  28.15113  0.1058698   
##  LM Arch Test       R    TR^2   25.23347  0.01375447  
## 
## Information Criterion Statistics:
##       AIC       BIC       SIC      HQIC 
## -1.346670 -1.300546 -1.346920 -1.328481

得到的均值方程和波动率方程为: \[r_t=0.0126+a_t,a_t=\epsilon_t\sigma_t\] \[\sigma_t^2=0.010421+0.232889a_{t-1}^2+0.075069a_{t-2}^2+0.051994a_{t-3}^2\] 因为结果中\(\alpha_2\)\(\alpha_3\)的估计值是不显著的,可以拟合ARCH(1)模型为波动率方程

mod2 <- garchFit( ~ 1 + garch(1,0), data=c(ts.intel), trace=FALSE)
summary(mod2)
## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~1 + garch(1, 0), data = c(ts.intel), trace = FALSE) 
## 
## Mean and Variance Equation:
##  data ~ 1 + garch(1, 0)
## <environment: 0x7ff74b9e22c0>
##  [data = c(ts.intel)]
## 
## Conditional Distribution:
##  norm 
## 
## Coefficient(s):
##       mu     omega    alpha1  
## 0.013130  0.011046  0.374976  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##         Estimate  Std. Error  t value Pr(>|t|)    
## mu      0.013130    0.005318    2.469  0.01355 *  
## omega   0.011046    0.001196    9.238  < 2e-16 ***
## alpha1  0.374976    0.112620    3.330  0.00087 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  299.9247    normalized:  0.675506 
## 
## Description:
##  Mon Mar  8 21:14:33 2021 by user:  
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value     
##  Jarque-Bera Test   R    Chi^2  144.3783  0           
##  Shapiro-Wilk Test  R    W      0.9678175 2.670321e-08
##  Ljung-Box Test     R    Q(10)  12.12248  0.2769429   
##  Ljung-Box Test     R    Q(15)  22.30705  0.1000019   
##  Ljung-Box Test     R    Q(20)  24.33412  0.2281016   
##  Ljung-Box Test     R^2  Q(10)  16.57807  0.08423723  
##  Ljung-Box Test     R^2  Q(15)  37.44349  0.001089733 
##  Ljung-Box Test     R^2  Q(20)  38.81395  0.007031558 
##  LM Arch Test       R    TR^2   27.32897  0.006926821 
## 
## Information Criterion Statistics:
##       AIC       BIC       SIC      HQIC 
## -1.337499 -1.309824 -1.337589 -1.326585

这个模型的aic比上个模型差一些。

为了进行模型验证,可以计算标准化残差\(\tilde \alpha_t = \frac{a_t}{\sigma_t}\) 计算标准化残差\(\tilde \alpha_t\)

resi <- residuals(mod2, standardize=TRUE)

#标准化残差的时序图
plot(ts(resi, start=start(ts.intel), frequency=frequency(ts.intel)), 
     xlab="年", ylab="标准化残差")

#标准化残差的acf
acf(resi, lag.max=36, main="")

acf(resi^2, lag.max=36, main="")

#除了在滞后11、滞后21还略高以外已经没有了低阶的波动率相关。
pacf(resi^2, lag.max=36, main="")

#仅在高阶的滞后11、滞后21还较高。 作为低阶模型, ARCH(1)作为波动率方程已经比较适合。

检验标准化残差的分布。

shapiro.test(resi)  #非正态
## 
##  Shapiro-Wilk normality test
## 
## data:  resi
## W = 0.96782, p-value = 2.67e-08
plot(density(resi)) #考虑t分布建模,可以比较aic

#garchFit中更改参数cond.dist="std"

另:fGarch包对建模见过自带若干诊断图

#plot(mod2)

获得拟合的均值用fitter()函数,

mod2f <- fitted(mod2)
head(mod2f)
##            1            2            3            4            5            6 
##  0.009999835 -0.150012753  0.067064079  0.082948635 -0.110348491  0.125162849

用predict函数做超前若干步的预测

mod2p <- predict(mod2, n.ahead=12)
mod2p
##    meanForecast meanError standardDeviation
## 1    0.01313003 0.1090513         0.1090513
## 2    0.01313003 0.1245215         0.1245215
## 3    0.01313003 0.1298482         0.1298482
## 4    0.01313003 0.1317901         0.1317901
## 5    0.01313003 0.1325109         0.1325109
## 6    0.01313003 0.1327802         0.1327802
## 7    0.01313003 0.1328811         0.1328811
## 8    0.01313003 0.1329188         0.1329188
## 9    0.01313003 0.1329330         0.1329330
## 10   0.01313003 0.1329383         0.1329383
## 11   0.01313003 0.1329403         0.1329403
## 12   0.01313003 0.1329411         0.1329411

预测包括均值的预测(显然是用\(\mu\)预测的)、 均值预测的标准误差、 波动率的预测(是用的值滚动计算的)。 波动率长期预测接近于ARCH模型的无条件标准差。

GARCH模型

ARCH模型用来描述波动率能得到很好的效果, 但实际建模时可能需要较高的阶数, 比如§17.5.3的欧元汇率波动率建模用了11阶的ARCH模型。

GARCH模型的定阶方法研究不多, 一般用试错法尝试较低阶的GARCH模型, 如GARCH(1,1), GARCH(2,1), GARCH(1,2)等。 许多情况下GARCH(1,1)就能解决问题。

继续用上面的数据建模

library(fGarch, quietly = TRUE)
mod1 <- garchFit(~ 1 + garch(1,1), data=ts.intel, trace=FALSE)
summary(mod1)
## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~1 + garch(1, 1), data = ts.intel, trace = FALSE) 
## 
## Mean and Variance Equation:
##  data ~ 1 + garch(1, 1)
## <environment: 0x7ff74a45c030>
##  [data = ts.intel]
## 
## Conditional Distribution:
##  norm 
## 
## Coefficient(s):
##         mu       omega      alpha1       beta1  
## 0.01126568  0.00091902  0.08643831  0.85258554  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##         Estimate  Std. Error  t value Pr(>|t|)    
## mu     0.0112657   0.0053931    2.089  0.03672 *  
## omega  0.0009190   0.0003888    2.364  0.01808 *  
## alpha1 0.0864383   0.0265439    3.256  0.00113 ** 
## beta1  0.8525855   0.0394322   21.622  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  312.3307    normalized:  0.7034475 
## 
## Description:
##  Mon Mar  8 21:14:34 2021 by user:  
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value     
##  Jarque-Bera Test   R    Chi^2  174.904   0           
##  Shapiro-Wilk Test  R    W      0.9709615 1.030282e-07
##  Ljung-Box Test     R    Q(10)  8.016844  0.6271916   
##  Ljung-Box Test     R    Q(15)  15.5006   0.4159946   
##  Ljung-Box Test     R    Q(20)  16.41549  0.6905368   
##  Ljung-Box Test     R^2  Q(10)  0.8746345 0.9999072   
##  Ljung-Box Test     R^2  Q(15)  11.35935  0.7267295   
##  Ljung-Box Test     R^2  Q(20)  12.55994  0.8954573   
##  LM Arch Test       R    TR^2   10.51401  0.5709617   
## 
## Information Criterion Statistics:
##       AIC       BIC       SIC      HQIC 
## -1.388877 -1.351978 -1.389037 -1.374326

对标准化残差及其平方的白噪声检验都通过了。而且AIC值更好。条件分布的正太性检验仍然不通过。 模型可以写成: \[r_t = 0.0113 + a_t, a_t = \sigma_t\epsilon_t, \epsilon_t i.i.d. \sim N(0,1)\]

\[\sigma_t^2 = 0.00092+0.0864a_{t-1}^2+0.852\sigma_{t-1}^2\]

如果均值非常数,需要回归拟合,使用ARMA+GARCH模型

这里没有对arma后的白噪声做过多的要求,直接根据aic判断

library(quantmod)
library(lattice)
library(timeSeries)
library(rugarch)
loadSymbols("^GSPC", from="1950-01-01")
spReturns = diff(log(Cl(GSPC)))
spReturns[as.character(head(index(Cl(GSPC)),1))] = 0
# Create the forecasts vector to store the predictions
windowLength = 500
foreLength = length(spReturns) - windowLength
forecasts <- vector(mode="character", length=foreLength)

for (d in 0:foreLength) {
    # Obtain the S&P500 rolling window for this day
    spReturnsOffset = spReturns[(1+d):(windowLength+d)]
    # Fit the ARIMA model
    final.aic <- Inf
    final.order <- c(0,0,0)
    for (p in 0:5) for (q in 0:5) {
        if ( p == 0 && q == 0) {
            next
        }
    #通过aic寻找最优的arma阶数
        arimaFit = tryCatch( arima(spReturnsOffset, order=c(p, 0, q)),
                             error=function( err ) FALSE,
                             warning=function( err ) FALSE )

        if( !is.logical( arimaFit ) ) {
            current.aic <- AIC(arimaFit)
            if (current.aic < final.aic) {
                final.aic <- current.aic
                final.order <- c(p, 0, q)
                final.arima <- arima(spReturnsOffset, order=final.order)
            }
        } else {
            next
        }
    }
    #对残差拟合garch模型
    spec = ugarchspec(
        variance.model=list(garchOrder=c(1,1)),
        mean.model=list(armaOrder=c(final.order[1], final.order[3]), include.mean=T),
        distribution.model="sged"
    )
    fit = tryCatch(
      ugarchfit(
        spec, spReturnsOffset, solver = 'hybrid'
      ), error=function(e) e, warning=function(w) w
    )

    # If the GARCH model does not converge, set the direction to "long" else
    # choose the correct forecast direction based on the returns prediction
    # Output the results to the screen and the forecasts vector
    if(is(fit, "warning")) {
      forecasts[d+1] = paste(index(spReturnsOffset[windowLength]), 1, sep=",")
      print(paste(index(spReturnsOffset[windowLength]), 1, sep=","))
    } else {
      fore = ugarchforecast(fit, n.ahead=1)
      ind = fore@forecast$seriesFor
      forecasts[d+1] = paste(colnames(ind), ifelse(ind[1] < 0, -1, 1), sep=",")
      print(paste(colnames(ind), ifelse(ind[1] < 0, -1, 1), sep=",")) 
    }
}

# Output the CSV file to "forecasts.csv"
write.csv(forecasts, file="forecasts.csv", row.names=FALSE)

# Input the Python-refined CSV file
spArimaGarch = as.xts( 
  read.zoo(
    file="forecasts_new.csv", format="%Y-%m-%d", header=F, sep=","
  )
)
# Create the ARIMA+GARCH returns
spIntersect = merge( spArimaGarch[,1], spReturns, all=F )
spArimaGarchReturns = spIntersect[,1] * spIntersect[,2]

# Create the backtests for ARIMA+GARCH and Buy & Hold
spArimaGarchCurve = log( cumprod( 1 + spArimaGarchReturns ) )
spBuyHoldCurve = log( cumprod( 1 + spIntersect[,2] ) )
spCombinedCurve = merge( spArimaGarchCurve, spBuyHoldCurve, all=F )

# Plot the equity curves
xyplot( 
  spCombinedCurve,
  superpose=T,
  col=c("darkred", "darkblue"),
  lwd=2,
  key=list( 
    text=list(
      c("ARIMA+GARCH", "Buy & Hold")
    ),
    lines=list(
      lwd=2, col=c("darkred", "darkblue")
    )
  )
)

其他:https://rpubs.com/englianhu/binary-Q1FiGJRGARCH

第四章条件异方差作业

  1. 查找格力电器(000651)日收盘价,并尝试用ARCH模型对其波动率建模。
da <- read.csv('~/Desktop/000651.csv',fileEncoding = "GBK")
library(lubridate)
ts.stock <- zoo(da[,c('收盘价')],
                as.POSIXct(da$日期))
plot(as.xts(ts.stock), type="l", 
     multi.panel=TRUE, theme="white",
     major.ticks="years",
     grid.ticks.on = "years")

ts.stock <- as.xts(tail(ts.stock,500))
ts.plot(ts.stock)#上升趋势

acf(ts.stock) 

pacf(ts.stock) #acf拖尾,pacf皆尾

forecast::auto.arima(ts.stock)
## Series: ts.stock 
## ARIMA(1,1,2) with drift 
## 
## Coefficients:
##          ar1      ma1     ma2   drift
##       0.7970  -1.1364  0.1616  0.0501
## s.e.  0.0455   0.0630  0.0569  0.0279
## 
## sigma^2 estimated as 22.15:  log likelihood=-1479.45
## AIC=2968.9   AICc=2969.02   BIC=2989.96
#arima(1,1,2)
model <- arima(ts.stock,order=c(5,1,3))
model
## 
## Call:
## arima(x = ts.stock, order = c(5, 1, 3))
## 
## Coefficients:
##          ar1      ar2     ar3      ar4      ar5      ma1     ma2      ma3
##       1.0996  -0.8540  0.4276  -0.0236  -0.0129  -1.4495  1.1917  -0.6777
## s.e.  0.3374   0.3378  0.0970   0.0808   0.0966   0.3350  0.4464   0.1544
## 
## sigma^2 estimated as 21.39:  log likelihood = -1472.69,  aic = 2963.39
Box.test(model$residuals) #白噪声
## 
##  Box-Pierce test
## 
## data:  model$residuals
## X-squared = 0.012938, df = 1, p-value = 0.9094
Box.test(model$residuals^2) #非白噪声
## 
##  Box-Pierce test
## 
## data:  model$residuals^2
## X-squared = 47.769, df = 1, p-value = 4.796e-12
#进一步对波动进行arch建模
res <- model$residuals

ARCH模型可以用acf,pacf初步判断阶数。

acf(res^2,lag.max=36,main='')

#在12处较高,另外在之后1,2,3上较突出
#残差平方和的pacf
pacf(res^2,lag.max=36,main="")

使用fGarch包的garchFit()函数建立ARCH模型。

library(fGarch)
mod1 <- garchFit( ~ 1+arma(5,1,3)+ garch(2,0), data=ts.stock, trace=FALSE)
#其中1表示均值方程是一个常数,
#输出结果中mu表示均值方程的均值,omega表示alpha0,alpha1为alpha1
summary(mod1)
## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~1 + arma(5, 1, 3) + garch(2, 0), data = ts.stock, 
##     trace = FALSE) 
## 
## Mean and Variance Equation:
##  data ~ 1 + arma(5, 1, 3) + garch(2, 0)
## <environment: 0x7ff74c456388>
##  [data = ts.stock]
## 
## Conditional Distribution:
##  norm 
## 
## Coefficient(s):
##       mu       ar1       ar2       ar3       ar4       ar5     omega    alpha1  
## -0.62940   1.00000  -0.76323   0.70764   0.72471  -0.66544   1.82423   1.00000  
##   alpha2  
##  1.00000  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##         Estimate  Std. Error  t value Pr(>|t|)    
## mu      -0.62940     0.65074   -0.967 0.333436    
## ar1      1.00000     0.07707   12.976  < 2e-16 ***
## ar2     -0.76323     0.08582   -8.894  < 2e-16 ***
## ar3      0.70764     0.09704    7.292 3.05e-13 ***
## ar4      0.72471     0.10167    7.128 1.02e-12 ***
## ar5     -0.66544     0.07975   -8.344  < 2e-16 ***
## omega    1.82423     0.52616    3.467 0.000526 ***
## alpha1   1.00000     0.08155   12.262  < 2e-16 ***
## alpha2   1.00000     0.09491   10.536  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  -1302.881    normalized:  -2.605762 
## 
## Description:
##  Mon Mar  8 21:14:35 2021 by user:  
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value  
##  Jarque-Bera Test   R    Chi^2  328093.6  0        
##  Shapiro-Wilk Test  R    W      0.5015763 0        
##  Ljung-Box Test     R    Q(10)  11.96909  0.2871305
##  Ljung-Box Test     R    Q(15)  14.70098  0.473163 
##  Ljung-Box Test     R    Q(20)  19.84824  0.4674582
##  Ljung-Box Test     R^2  Q(10)  0.1092745 1        
##  Ljung-Box Test     R^2  Q(15)  0.1653461 1        
##  Ljung-Box Test     R^2  Q(20)  0.2268573 1        
##  LM Arch Test       R    TR^2   0.124615  1        
## 
## Information Criterion Statistics:
##      AIC      BIC      SIC     HQIC 
## 5.247523 5.323386 5.246890 5.277292
resi <- residuals(mod1, standardize=TRUE)
#标准化残差的时序图
plot(ts(resi, start=start(ts.stock), frequency=frequency(ts.stock)), 
     xlab="年", ylab="标准化残差")

#标准化残差的acf
acf(resi, lag.max=36, main="")

acf(resi^2, lag.max=36, main="")

pacf(resi^2, lag.max=36, main="")

#没有了波动率相关。

4.查找沪深300指数日数据,并尝试用ARCH模型对其波动率建模。

da <- read.csv('~/Desktop/399300.csv',fileEncoding = "GBK")
library(lubridate)
ts.stock <- zoo(da[,c('收盘价')],
                as.POSIXct(da$日期))
plot(as.xts(ts.stock), type="l", 
     multi.panel=TRUE, theme="white",
     major.ticks="years",
     grid.ticks.on = "years")

ts.stock <- as.xts(tail(ts.stock,500))
ts.stock %>% ggtsdisplay()

ts.stock %>% diff() %>% ggtsdisplay()

ts.stock %>% log() %>% ggtsdisplay()

#上升趋势,acf拖尾,pacf截尾
forecast::auto.arima(ts.stock)
## Series: ts.stock 
## ARIMA(3,1,1) with drift 
## 
## Coefficients:
##           ar1     ar2     ar3     ma1   drift
##       -0.7462  0.0368  0.0986  0.8037  4.7986
## s.e.   0.1151  0.0561  0.0490  0.1088  2.6901
## 
## sigma^2 estimated as 2912:  log likelihood=-2695.75
## AIC=5403.49   AICc=5403.66   BIC=5428.77
model <- arima(ts.stock,order=c(3,1,1))
model
## 
## Call:
## arima(x = ts.stock, order = c(3, 1, 1))
## 
## Coefficients:
##           ar1     ar2     ar3     ma1
##       -0.7348  0.0481  0.1061  0.7989
## s.e.   0.1150  0.0559  0.0486  0.1089
## 
## sigma^2 estimated as 2901:  log likelihood = -2697.31,  aic = 5404.61
Box.test(model$residuals) #白噪声
## 
##  Box-Pierce test
## 
## data:  model$residuals
## X-squared = 0.013309, df = 1, p-value = 0.9082
Box.test(model$residuals^2) #非白噪声
## 
##  Box-Pierce test
## 
## data:  model$residuals^2
## X-squared = 5.5184, df = 1, p-value = 0.01882
#进一步对波动进行arch建模
res <- model$residuals

ARCH模型可以用acf,pacf初步判断阶数。

acf(res^2,lag.max=36,main='')

#在12处较高,另外在之后1,2,3上较突出
#残差平方和的pacf
pacf(res^2,lag.max=36,main="")

使用fGarch包的garchFit()函数建立ARCH模型。

library(fGarch)
mod1 <- garchFit( ~ 1+arma(3,1,1)+ garch(1,0), data=ts.stock, trace=FALSE)
#其中1表示均值方程是一个常数,
#输出结果中mu表示均值方程的均值,omega表示alpha0,alpha1为alpha1
#根据系数的显著性做调整
summary(mod1)
## 
## Title:
##  GARCH Modelling 
## 
## Call:
##  garchFit(formula = ~1 + arma(3, 1, 1) + garch(1, 0), data = ts.stock, 
##     trace = FALSE) 
## 
## Mean and Variance Equation:
##  data ~ 1 + arma(3, 1, 1) + garch(1, 0)
## <environment: 0x7ff74c0a60c0>
##  [data = ts.stock]
## 
## Conditional Distribution:
##  norm 
## 
## Coefficient(s):
##          mu          ar1          ar2          ar3        omega       alpha1  
##   13.990177     1.000000     0.010294    -0.012689  2269.843557     0.268760  
## 
## Std. Errors:
##  based on Hessian 
## 
## Error Analysis:
##          Estimate  Std. Error  t value Pr(>|t|)    
## mu       13.99018    19.09086    0.733  0.46367    
## ar1       1.00000     0.05943   16.828  < 2e-16 ***
## ar2       0.01029     0.06513    0.158  0.87441    
## ar3      -0.01269     0.04427   -0.287  0.77438    
## omega  2269.84356   185.78754   12.217  < 2e-16 ***
## alpha1    0.26876     0.08187    3.283  0.00103 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log Likelihood:
##  -2695.309    normalized:  -5.390618 
## 
## Description:
##  Mon Mar  8 21:14:38 2021 by user:  
## 
## 
## Standardised Residuals Tests:
##                                 Statistic p-Value     
##  Jarque-Bera Test   R    Chi^2  207.0629  0           
##  Shapiro-Wilk Test  R    W      0.9589577 1.391679e-10
##  Ljung-Box Test     R    Q(10)  14.35463  0.1574218   
##  Ljung-Box Test     R    Q(15)  19.44239  0.1943819   
##  Ljung-Box Test     R    Q(20)  23.17653  0.28021     
##  Ljung-Box Test     R^2  Q(10)  29.10891  0.001196442 
##  Ljung-Box Test     R^2  Q(15)  37.02431  0.001255208 
##  Ljung-Box Test     R^2  Q(20)  38.56193  0.00755475  
##  LM Arch Test       R    TR^2   25.38049  0.01311926  
## 
## Information Criterion Statistics:
##      AIC      BIC      SIC     HQIC 
## 10.80524 10.85581 10.80495 10.82508
resi <- residuals(mod1, standardize=TRUE)
#标准化残差的时序图
plot(ts(resi, start=start(ts.stock), frequency=frequency(ts.stock)), 
     xlab="年", ylab="标准化残差")

#标准化残差的acf
acf(resi, lag.max=36, main="")

acf(resi^2, lag.max=36, main="")

pacf(resi^2, lag.max=36, main="")

#没有了波动率相关。

使用rugarch包进行garch,便于一起预测arma和garch

library(rugarch)
model.garch = ugarchspec(mean.model=list(armaOrder=c(3,1,1),include.mean=TRUE),
                         variance.model=list(model='sGARCH',garchOrder=c(1,0)))
#model Valid models (currently implemented) are “sGARCH”, “fGARCH”, “eGARCH”, “gjrGARCH”, “apARCH” and “iGARCH” and “csGARCH”.
model.garch.fit = ugarchfit(data=ts.stock, spec=model.garch )
model.garch.fit
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,0)
## Mean Model   : ARFIMA(3,0,1)
## Distribution : norm 
## 
## Optimal Parameters
## ------------------------------------
##           Estimate  Std. Error    t value Pr(>|t|)
## mu     3099.700409    1.983654  1562.6212 0.000000
## ar1       0.099566    0.052889     1.8826 0.059761
## ar2       0.954577    0.007328   130.2577 0.000000
## ar3      -0.047632    0.046344    -1.0278 0.304050
## ma1       1.000000    0.000025 39994.3468 0.000000
## omega  2265.939234  187.502340    12.0849 0.000000
## alpha1    0.236764    0.074350     3.1844 0.001450
## 
## Robust Standard Errors:
##           Estimate  Std. Error    t value Pr(>|t|)
## mu     3099.700409    2.081529  1489.1460 0.000000
## ar1       0.099566    0.042965     2.3174 0.020484
## ar2       0.954577    0.005886   162.1868 0.000000
## ar3      -0.047632    0.037819    -1.2595 0.207856
## ma1       1.000000    0.000027 37074.4144 0.000000
## omega  2265.939234  337.919059     6.7056 0.000000
## alpha1    0.236764    0.117688     2.0118 0.044241
## 
## LogLikelihood : -2689.533 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       10.786
## Bayes        10.845
## Shibata      10.786
## Hannan-Quinn 10.809
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                          statistic p-value
## Lag[1]                      0.0134  0.9078
## Lag[2*(p+q)+(p+q)-1][11]    5.0447  0.9498
## Lag[4*(p+q)+(p+q)-1][19]    8.8548  0.6626
## d.o.f=4
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                    0.09406  0.7591
## Lag[2*(p+q)+(p+q)-1][2]   0.42373  0.7305
## Lag[4*(p+q)+(p+q)-1][5]   4.28760  0.2203
## d.o.f=1
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[2]    0.6541 0.500 2.000 0.41865
## ARCH Lag[4]    5.0922 1.397 1.611 0.08359
## ARCH Lag[6]    9.1950 2.222 1.500 0.02158
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.7898
## Individual Statistics:              
## mu     0.12571
## ar1    0.05968
## ar2    0.09902
## ar3    0.05362
## ma1    0.29883
## omega  0.30924
## alpha1 0.11060
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.69 1.9 2.35
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.2952 0.7680    
## Negative Sign Bias  0.4298 0.6675    
## Positive Sign Bias  0.8137 0.4162    
## Joint Effect        1.0815 0.7815    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     38.80     0.004688
## 2    30     45.16     0.028388
## 3    40     60.16     0.016358
## 4    50     74.40     0.011110
## 
## 
## Elapsed time : 0.592417
#alpha1
resi <- residuals(model.garch.fit)
checkresiduals(ts(resi)) #比较正态的

Box.test(resi)
## 
##  Box-Pierce test
## 
## data:  resi
## X-squared = 0.52614, df = 1, p-value = 0.4682
Box.test(resi^2)
## 
##  Box-Pierce test
## 
## data:  resi^2
## X-squared = 6.954, df = 1, p-value = 0.008363
  • 预测

样本外预测

ugarchforecast(model.garch.fit,n.ahead=3)
## 
## *------------------------------------*
## *       GARCH Model Forecast         *
## *------------------------------------*
## Model: sGARCH
## Horizon: 3
## Roll Steps: 0
## Out of Sample: 0
## 
## 0-roll forecast [T0=2021-01-07]:
##     Series Sigma
## T+1   5536 65.12
## T+2   5536 57.18
## T+3   5553 55.14

样本内预测

out_of_sample <- round(length(ts.stock)/2)
dates_out_of_sample <- tail(index(ts.stock), out_of_sample)

garch_fit <- ugarchfit(spec = model.garch, data = ts.stock, out.sample = out_of_sample)
coef(garch_fit)
##           mu          ar1          ar2          ar3          ma1        omega 
## 3.110350e+03 1.889647e-01 7.442907e-01 6.876422e-02 8.579572e-01 1.984805e+03 
##       alpha1 
## 4.397274e-02
garch_fore <- ugarchforecast(garch_fit, n.ahead = 1, n.roll = out_of_sample-1)
forecast_value <- xts(garch_fore@forecast$seriesFor[1, ], dates_out_of_sample)
forecast_volatility <- xts(garch_fore@forecast$sigmaFor[1, ], dates_out_of_sample)

# plot of ts.stock value
plot(cbind("fitted"   = fitted(garch_fit),
           "forecast" = forecast_value,
           "original" = ts.stock), 
     col = c("blue", "red", "black"), lwd = c(0.5, 0.5, 2),
     main = "Forecast of ts.stock", legend.loc = "topleft") #拟合不错

# plot of volatility
plot(cbind("fitted volatility"   = sigma(garch_fit),
           "forecast volatility" = forecast_volatility,
           "log-returns"         = log(ts.stock)), 
     col = c("blue", "red", "black"), lwd = c(2, 2, 1),
     main = "Forecast of volatility of synthetic log-returns", legend.loc = "topleft")

R Session Information

sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] zh_CN.UTF-8/zh_CN.UTF-8/zh_CN.UTF-8/C/zh_CN.UTF-8/zh_CN.UTF-8
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] rugarch_1.4-4       lubridate_1.7.9     fGarch_3042.83.2   
##  [4] fBasics_3042.89.1   timeSeries_3062.100 timeDate_3043.102  
##  [7] xts_0.12.1          zoo_1.8-8           readr_1.4.0        
## [10] forecast_8.13       dplyr_1.0.2        
## 
## loaded via a namespace (and not attached):
##  [1] mclust_5.4.7                Rcpp_1.0.6                 
##  [3] mvtnorm_1.1-1               lattice_0.20-41            
##  [5] digest_0.6.27               lmtest_0.9-38              
##  [7] truncnorm_1.0-8             R6_2.5.0                   
##  [9] evaluate_0.14               ggplot2_3.3.2              
## [11] pillar_1.4.6                rlang_0.4.10               
## [13] spatial_7.3-12              curl_4.3                   
## [15] fracdiff_1.5-1              TTR_0.24.2                 
## [17] nloptr_1.2.2.2              SkewHyperbolic_0.4-0       
## [19] Matrix_1.2-18               rmarkdown_2.5              
## [21] labeling_0.4.2              stringr_1.4.0              
## [23] munsell_0.5.0               compiler_4.0.3             
## [25] numDeriv_2016.8-1.1         xfun_0.18                  
## [27] DistributionUtils_0.6-0     pkgconfig_2.0.3            
## [29] urca_1.3-0                  htmltools_0.5.1.1          
## [31] nnet_7.3-14                 Rsolnp_1.16                
## [33] tidyselect_1.1.0            tibble_3.0.4               
## [35] bookdown_0.21               quadprog_1.5-8             
## [37] crayon_1.4.1                MASS_7.3-53                
## [39] GeneralizedHyperbolic_0.8-4 grid_4.0.3                 
## [41] nlme_3.1-149                gtable_0.3.0               
## [43] lifecycle_1.0.0             magrittr_2.0.1             
## [45] scales_1.1.1                rmdformats_0.3.7           
## [47] KernSmooth_2.23-17          quantmod_0.4.17            
## [49] stringi_1.5.3               farver_2.0.3               
## [51] tseries_0.10-47             ellipsis_0.3.1             
## [53] generics_0.0.2              vctrs_0.3.4                
## [55] spd_2.0-1                   tools_4.0.3                
## [57] glue_1.4.2                  purrr_0.3.4                
## [59] hms_0.5.3                   ks_1.11.7                  
## [61] yaml_2.2.1                  colorspace_1.4-1           
## [63] knitr_1.30